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1  Introduction 

Multichip  modules  (MCMs)  incorporate  several  microcircuits  into  a  single  package,  offering 
the  potential  to  increase  circuit  density  to  over  100  chips  per  module  package.  This  technol¬ 
ogy  was  first  proposed  in  the  Wafer  Scale  Integration  (WSI)  Program  at  Rome  Laboratory. 
There  are  now  commercial  vendors,  including  Mercury  Supercomputer  Systems,  Inc.,  Intel 
Corporation  and  Honeywell  Incorporated,  developing  and  supplying  MCMs.  There  are  varied 
implementations  of  WSI,  but  with  the  same  goal  of  increasing  the  electronic  circuit  density, 
the  reliability  and  the  module  performance.  While  MCMs  hold  potential  for  delivering  the 
computational  density  required  for  timely  processing  of  large  amounts  of  data,  the  reliability 
assessment  of  MCMs  is  a  challenging  task  due  to  the  high  complexity  [1]  of  modeling  the 
behavior  in  inhomogeneous  multi-material  media.  Material  interfaces  are  a  major  source  of 
singularities.  Inherent  complexity,  high  unit  cost  and  low  production  levels  mahe  traditional 
statistical  based  methods  of  testing  impractical.  The  solution  is  to  develop  the  reliability 
assessment  process  during  the  design  stages.  An  important  component  of  this  effort  is  the 
finite  element  thermal  and  stress  analysis  of  the  multichip  design.  In  fact,  this  is  the  critical 
component  in  terms  of  speed  and  accuracy.  Since  flaws  in  a  design  tend  to  occur  at  material 
interfaces,  the  accurate  resolution  of  the  thermal  diffusion  and  induced  stress  effects  at  these 
singular  interfaces  is  of  great  importance. 

We  have  carried  out  a  combined  research  and  implementation  project  to  evaluate  the 
feasibility  of  our  solution  and  implementation  methodologies  for  creating  a  software  tool 
for  analyzing  possible  failures  in  the  design  stages  of  MCMs.  One  result  of  this  effort  is 
the  development  of  a  computational  technology  base  for  the  solution  of  the  initial  value 
problem  for  the  2-dimensional  heat  diffusion  equation  in  a  finite,  composite  multimaterial 
region.  Within  this  computational  framework,  we  have  also  developed  and  implemented 
3-dimensional  computational  cores  based  on  FFT  routines  incorporating  global  data  sym¬ 
metries.  Implementations  were  made  on  single  processor  Pentium,  4  parallel  processor  i860 
and  the  multichip  P6. 

We  have  made  the  following  technology  transfers. 

•  Dr.  John  Hines  of  Wright  Laboratory  has  agreed  to  beta  test  our  existing  and  future 
codes  for  the  purpose  of  insuring  relevance  and  usability. 

•  We  gave  an  invited  presentation  on  our  work  at  the  University  of  North  Carolina 
conference  on  “Wavelets,  relations  with  operators  and  applications,”  held  24-28  July. 

•  Visited  engineers  (D.  Holzhauer  et  al)  of  Rome  Laboratory  at  Griffiss  AFB.  We  dis¬ 
cussed  the  capabilities  and  limitations  of  the  existing  software,  IMCMA  (Intelligent 
Multichip  Module  Analyzer),  developed  at  Griffiss  AFB.  We  have  thus  established  a 
working  relationship  with  Mr.  D.  Holzhauer  et  al.  The  significance  of  this  relation¬ 
ship  was  two  fold:  IMCMA  provides  validation  of  our  methods  and  codes;  IMCMA  is 
unique  in  its  capability  to  address  the  currently  most  important  geometry  of  flip  chip 
configuration. 

•  Presented  our  results  to  scientists  (Mr.  Lyke  et  al)  at  Phillips  Laboratory,  Kirtland 
AFB. 
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The  persons  supported  by  and  contributed  to  this  project  are  Drs.  M  An,  J.  Weiss  and 
J.  Byrnes  of  Prometheus  and  Professor  R.  Tolimieri  and  Mr.  D.  Wahl  of  CCNY/CUNY. 

2  Solving  the  Heat  Diffusion  Equation. 

We  have  used  an  adaptation  of  finite  element  analysis  (FEA)  procedure  incorporating  wavelet 
bases.  Being  both  compactly  supported  and  orthogonal,  wavelets  combine  the  advantages  of 
finite  element,  splines  and  Fourier  spectral  methods.  Due  to  active  research  in  the  theory  of 
wavelets,  there  now  exist  methods  for  exact  evaluation  of  functionals  of  compactly  supported 
wavelets  required  for  correct  application  of  the  Galerkin  procedures. 

The  time-dependent,  heat  diffusion  equation, 

( A  -  V  •  V  +d')i’  =  P, 

describes  the  evolution  of  the  temperature  V’  in  some  region  of  interest.  For  multicomponent 
systems,  k  and  d  are,  in  general,  discontinuous  at  the  interfaces  of  components.  The  (internal) 
boundary  conditions  at  an  interface  are  [3] 

At  an  interface  there  is  a  jump  in  temperature  V*  and  its  normal  derivative,  V’n- 


2.1  Compactly  supported  wavelets 

To  describe  the  class  of  compactly  supported  wavelets  briefly,  let  ^  be  a  solution  of  the 
scaling  relation 

7V-1 

=  12  -  k). 

fc=0 

The  ak  are  a  collection  of  coefficients  that  categorize  the  specific  wavelet  basis.  The  expres¬ 
sion  (f>  is  called  the  scaling  function. 

The  translates  of  (j)  are  required  to  be  orthonormal 

J  <l>{x  -  k)4>{x  -  m)  =  6k, m- 

From  the  scaling  relation  this  imphes  the  condition 

N 

^  2m  ~  ^Om* 

k=0 

For  coefficients  satisfying  the  above  two  conditions,  the  functions  consisting  of  translates 
and  dilations  of  the  scaling  function,  (j){2^x  —  k),  form  a  basis  for  square  integrable  functions 
on  the  real  line. 

If  only  a  finite  number  of  the  ak  are  nonzero  then  <f)  has  compact  support. 

The  compactly  supported  wavelet  is  defined  by  the  equation 

i’i^)  =  12i-'^f(^i-k<l>{^x  -  k). 

The  translates  of  the  scaling  function  and  wavelet  define  orthogonal  subspaces,  i.e. 

j  (j){x)tl;{x  -  m)dx  =  '^{-ifai-kak-im  =  0. 
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2.2  The  wavelet-Galerkin  method 

For  a  PD  E  of  the  form 

define  the  wavelet  expansion 

U^'£Uk4>{x-k). 

An  approximation  to  the  solution  is  defined  by 

0=^2  UkHx-k). 

k=-M 

Thus  the  solution  is  projected  onto  the  subspace  spanned  by 

$(M,iV)  =  {(^(x-A:)  :  fc  = 


Herein  and  in  what  follows,  we  assume,  for  simplicity  and  without  loss  of  generality,  that  the 
dilation  factor  2^  has  been  normalized  to  1  by  a  scale  transformation  y  =  2^ x.  In  effect,  the 
integers  are  the  finest  scale.  To  determine  the  coefficients  of  this  expansion  we  substitute 
into  the  equation  and  again  project  the  resulting  expression  onto  the  subspace  $(M,  iV), 
This  uniquely  determines  the  coefficients  Uk- 

The  projection  requires  Uk  to  satisfy  the  equations 


/: 


for  k  =  — M, 
of  the  form 


(f){x  —  k)F{U,  Uti  Uxi  •  •  ■)dx  =  0 

,  N.  To  evaluate  this  expression  we  must  know  the  connection  coefficients 

(f){x)<j)x{x  —  ki)'’  •  (l>xxix  —  k2)  -  ••  dx. 

A  typical  functional  (three  term  connection  coefficient)  is 

^{k,j)  =  j  (l>xx{x)(l>x{x  —  k)4>{x  -  j)dx. 

Since  the  scaling  function  used  to  define  compact  wavelets  has  a  limited  number  of  deriva¬ 
tives,  the  numerical  evaluation  of  these  expressions  is  often  unstable  or  inaccurate.  We  have 
found  methods  for  evaluating  the  functionals  exactly  based  on  the  scaling  relation 

N 

=  '^o.k<i>{2x  -  k). 
fc=0 


By  straightforward  manipulations,  a  system  of  equations  is  found  for  the  n{k,j).  The  system 
of  equations  is  generally  rank  deficient  (singular).  The  rank  deficiency  is  cured  and  a  unique 
solution  is  obtained  by  the  inclusion  of  an  additional  set  of  linear  equations  that  are  obtained 
from  the  moment  equations.  The  resulting  system  is  non-singular  and  non-homogeneous  and 
has  a  unique  solution  that  is  easily  found  by  standard  techniques.  One  of  these  techniques 
is  derived  in  [4]. 
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2.3  Domain  decomposition 

The  wavelet- Galerkin  and  capacitance  matrix  methods  for  domain  decomposition  have  been 
carried  out  by  an  implicit  time  differencing  and  wavelet-Galerkin  discretization  with  the 
capacitance  rtiatrix  method  to  impose  boundary  conditions. 

For  the  wavelet-Galerkin  method  we  expand  k,  d,  and  V’’"  in  scaling  function  expansions 

^  =  XI -  j)^ 


=  X  X  -  j), 

^=xx  -  j)4>{y  -  k) 

and  apply  the  Galerkin  procedure  to  determine  the  coefficients  (V’i.fc)-  To  resolve  the  dis¬ 
continuities  of  k  and  d  we  use  the  wavelet  transform  to  find  the  smooth  (large  scale)  and 
discontinuous  (small  scale)  parts 

k  —  kg  -|-  k{ 
d  =  dg  df. 

In  effect,  we  expand  {k,d)  into  scaling  function  and  wavelet  components.  The  discontinuous 
(wavelet)  {ki,di)  are  localized  in  a  neighborhood  of  the  interface.  This  process  decomposes 
the  diffusion  operator  into  smooth  and  discontinuous  parts 


L  =  •  k\/  +d  =  Lg  +  Li. 


With  this  representation  the  implicit  time  differencing  is  defined  as 

(/  -  dtLg)lpn+l  -{1  +  dtLg)lj)n  +  dtT,(V’n-|-l  +  V’n)/2  +  P 
and  solved  by  iteration  at  each  time  step. 

Implicit  time  differencing  is  used  to  time  advance  the  Galerkin  coefficients.  The  internal 
boundaries  cause  singularities  that  are  represented  by  a  capacitance  matrix  term  defined 
at  the  boundary.  This  is  similar  to  the  approach  of  [3]  where  a  fictitious  layer  is  used  to 
represent  imperfect  contact  at  interface.  Therefore,  the  interfaces  are  approximated  as  sin¬ 
gularities  using  the  wavelet-Galerkin  approximation  of  the  Green’s  function,  desingularizing 
the  approximation. 


3  Digital  Implementations 

Formulating  the  implicit  time  differencing  as  a  convolutional  product,  solution  is  found  as 
the  Fourier  transform  of  the  diffusion  operator.  By  the  convolution  theorem,  the  convolution 
product  is  computed  as  the  pointwise  product  of  the  Fourier  transforms.  This  requires  the 
computation  of  the  2-dimensional  Fourier  transform  and  its  inverse  for  every  iteration  at  each 
time  step.  Thus,  over  90%  of  the  computational  burden  is  carried  out  by  the  2-dimensional 
FFT. 
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3.1  Optimized  FFT  routines 

There  exists  optimized  FFT  codes  on  wide  ranges  of  hardware  architectures.  In  fact,  we 
have  been  developing  and  implementing  many  of  these  codes  on  distributed  and  parallel 
architectures  based  on  special  and  general  purpose  digital  signal  processing  chips  and  reduced 
instruction  set  computers.  The  availability  of  optimized  FFT  codes  which  act  on  highly 
flexible  input  and  output  data  structures  was  of  critical  importance  for  efficient  interfacing 
of  the  Fourier  transform  with  current  modeling  strategies. 

The  2-dimensional  N  x  N  DFT  is  defined  by 

2/(ni,n2)  =  0  <  nr,n2,h,k2  <  N. 

A:i=0  ^2=0 

Both  the  diffusion  operator  and  data  at  each  time  step  are  real  valued.  Thus  the  memory 
(or  cache)  required  is  exactly  1/2  that  of  complex  data.  Algorithm  is  given  to  show  that  the 
required  computation  is  also  roughly  1/2. 

Algorithm  for  real  FT 

•  Define  a  complex  2-dimensional  array  u  of  size  N  x  {N/2  -f  1)  from  x{ki,k2)  by 

u{ki,k2)  =  x{ki,2k2)  +  ix{ki,2k2  +  1),  0  <  ki,k2  <  N/2. 

•  Compute  the  A-point  FT  along  the  first  dimension  of  u. 

•  Matrix  transpose  the  result  using  the  intermediate  symmetry  to  the  array  v  of  size 
N  X  {N/2  -f  1)  from  u. 

•  Compute  the  A^-point  FT  along  the  second  dimension. 

•  Define  a  complex  2-dimensional  array  y  by 

y{ni,2*n2)  =  1/2  (u(ni,  n2) -h  u*(A  -  ni,  A  -  712)) , 
y{ni,2*n2  +  l)  -  -i/2{v{ni,n2)  -  v*{N  -  ni,N  -  112)) , 

0  <  ni  <  A,  0  <  n2  <  A/2. 

This  algorithm  as  well  the  one  below  were  implemented  with  minimum  of  2-fold  speed 
up  over  complex-to-complex  FFT  routines:  smaller  size  data  more  than  made  up  for  the 
overhead  in  incorporating  the  symmetry.  For  computation  sizes  that  are  well  over  the  cache 
of  the  machines,  this  routine  performs  many  times  (up  to  20)  faster  than  complex-to-complex 
routines  by  avoiding  memory  swapping. 

For  the  inverse  FT,  we  observed  that  the  Fourier  transform  of  a  real  sequence  is  Hermitian 
symmetric,  i.e., 

y{N  -  ni,  A  -  712)  =  y*{ni,n2),  0  <  ni,n2  <  A. 

Thus  again,  the  required  memory  and,  as  we  will  see,  computation  required  are  roughly  1/2. 


Algorithm  for  Hermitian  FT 
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•  Compute  the  A^-point  inverse  FT  along  the  first  dimension  of  y. 

•  Define  a  complex  2-dimensional  array  u  of  size  N  x  {N/2  d-  1)  from  the  transpose  of 
y{ki,ri2)  by 

u{ki,n2)  =  y{2*ki,n2)  +  iy{2*ki  +  l,n2), 
u{ki,N-n2)  -  t/*(2  * /ji,  722)  +  *j/*(2  *  fci  +  1,712), 

0  <  A:i  <  N/2,  0  <  712  <  N/2. 

•  Compute  the  iV-point  FT  along  the  second  dimension  of  u. 

•  Extract  the  desired  result  by 

x{2*ki,k2)  =  real{u{ki,k2)) 
x(2  ♦  A: -t- 1,  A:2)  =  imag{u{k\,k2)). 

Similar  algorithms  were  developed  and  implemented  for  90°-rotational  symmetry  with 
comparable  performances. 

4  Results  and  Conclusions 

The  goal  of  the  proposed  software  tool  is  computational  efficiency  without  trading  off 
numerical  accuracy.  The  critical  issue  determining  the  applicability  of  the  software  tool 
is  portability  and  scalability  of  its  implementation  so  as  to  interface  with  already  existing 
technology.  To  this  end,  we  have  investigated  and  concluded  that 

•  the  wavelet-based  numerical  methods  are  effective. 

•  incorporation  of  symmetry  is  effective  in  terms  of  accuracy,  computational  complexity 
and  memory  requirements. 

•  computational  and  communication  complexity  of  our  numerical  methods  for  optimal 
implementation  is  feasible. 

•  many  hardware  platforms  are  feasible  for  carrying  out  the  required  computations. 

Our  conclusion  is  based  on  the  following. 

•  We  have 

Designed,  implemented  and  tested  the  wavelet- Galerkin  solver  for  the  time-dependent, 
two-dimensional  and  composite,  heat  equation 

Developed  and  applied  an  adaptive  implicit  time  differencing  for  the  time- dependent, 
two-dimensional  and  composite,  heat  equation  that  imposes  a  monotone  convergence. 

Developed  a  new  variant  of  a  higher  order,  implicit  Runge-Kutta  time  differencing  that 
converges  faster  and  with  a  time  step  that  is  two  to  three  larger  than  the  implicit  Euler 
time  differencing  used  previously. 
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Coded  and  optimized  an  alternate  formulation  of  the  wavelet-Galerkin  solver  for  the 
time-dependent,  two-dimensional  and  composite,  heat  equation  that  allows  fast  (FFT 
based)  algorithms  through  a  consistent  wavelet-Galerkin  operator  splitting. 

We  have  found  that 

-  Continuous  adjustment  of  the  time  step  can  control  the  (iterative)  implicit  time 
step  error  to  be  at  a  prescribed  level. 

—  For  a  piecewise  constant  coefficient  of  diffusion  there  exists  a  fast  (FFT  based) 
implementation  of  the  full  wavelet-Galerkin  method.  The  timings  are;  the  full 
wavelet-Galerkin  evaluation  procedure  is  95%  of  the  program  CPU  run  time.  By 
fully  taking  into  account  the  local  tensor  product  and  piecewise  constant  structure 
of  the  coefficient  of  diffusion,  the  cost  of  the  exact  wavelet-Galerkin  evaluation 
can  be  reduced  to  about  5%  of  the  program  CPU  run  time  (for  the  non-FFT 
evaluations). 

—  Variations  of  the  time  stepping  method  can  increase  the  time  step  size  and  improve 
convergence  for  implicit  time  differencing  schemes. 

•  We  have  applied  optimized,  multidimensional,  parallel  FFT  algorithms  to  implement 
the  convolutions  and  deconvolutions  required  by  the  wavelet-Galerkin  methods. 

Implemented  2-dimensional  real  convolution  kernels  based  on  higher  radix  FFT  incor¬ 
porating  global  symmetries.  Performance  of  the  kernel  is  minimally  2  fold  but  often 
much  more  than  that  of  radix  2  FFT. 

Designed  3-dimensional  real  convolutional  kernels  that  can  incorporate  symmetries  for 
reducing  computation  and  memory  requirements.  These  convolutional  kernels  are  also 
denser  than  powers  of  two. 

•  We  have  identified  computational  cores  that  have  been  optimally  implemented  as  well 
as  those  which  require  optimal  implementation. 

Denser  than  powers  of  two  FFT  routines  must  be  implemented.  The  density  will 
be  most  important  to  keeping  the  problem  sizes  to  be  minimal  required  in  higher 
dimensions. 

We  have  formulated  a  vector-radix  multidimensional  FFT  algorithm  [2,  5]  for  better 
adapting  the  mesh  data  decomposition  structure.  Mesh  data  decomposition  is  a  more 
natural  and  efficient  domain  decomposition  method  for  parallelizing  our  code.  Com¬ 
patibility  of  the  vector-radix  algorithm  with  various  possible  symmetries  was  one  of 
the  parameters  in  our  formulation. 

•  We  have  begun  the  development  of  multichip  module  benchmarks  for  differing  input 
geometries. 

This  benchmark  was  tested  for  a  two-component  system  separated  by  a  narrow  gap 
on  a  low  diffusion  background.  The  calculation  is  nearly  as  fast  as  that  for  the  single 
component  system  done  previously. 
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